 atmospheric state


Appa: Bending Weather Dynamics with Latent Diffusion Models for Global Data Assimilation

Andry, Gérôme, Lewin, Sacha, Rozet, François, Rochman, Omer, Mangeleer, Victor, Pirlet, Matthias, Faulx, Elise, Grégoire, Marilaure, Louppe, Gilles

arXiv.org Artificial Intelligence

Deep learning has advanced weather forecasting, but accurate predictions first require identifying the current state of the atmosphere from observational data. In this work, we introduce Appa, a score-based data assimilation model generating global atmospheric trajectories at 0.25° resolution and 1-hour intervals. Powered by a 565M-parameter latent diffusion model trained on ERA5, Appa can be conditioned on arbitrary observations to infer plausible trajectories, without retraining. Our probabilistic framework handles reanalysis, filtering, and forecasting within a single model, producing physically consistent reconstructions from various inputs. Results establish latent score-based data assimilation as a promising foundation for future global atmospheric modeling systems.


MODS: Multi-source Observations Conditional Diffusion Model for Meteorological State Downscaling

Tu, Siwei, Xu, Jingyi, Yang, Weidong, Bai, Lei, Fei, Ben

arXiv.org Artificial Intelligence

Accurate acquisition of high-resolution surface meteorological conditions is critical for forecasting and simulating meteorological variables. Directly applying spatial interpolation methods to derive meteorological values at specific locations from low-resolution grid fields often yields results that deviate significantly from the actual conditions. Existing downscaling methods primarily rely on the coupling relationship between geostationary satellites and ERA5 variables as a condition. However, using brightness temperature data from geostationary satellites alone fails to comprehensively capture all the changes in meteorological variables in ERA5 maps. To address this limitation, we can use a wider range of satellite data to make fuller use of its inversion effects on various meteorological variables, thus producing more realistic results across different meteorological variables. To further improve the accuracy of downscaling meteorological variables at any location, we propose the Multi-source Observation Down-Scaling Model (MODS). It is a conditional diffusion model that fuses data from multiple geostationary satellites (GridSat), polar-orbiting satellites (AMSU-A, HIRS, and MHS), and topographic data (GEBCO) as conditions, and is pre-trained on the ERA5 reanalysis dataset. During training, latent features from diverse conditional inputs are extracted separately and fused into ERA5 maps via a multi-source cross-attention module. By exploiting the inversion relationships between reanalysis data and multi-source atmospheric variables, MODS generates atmospheric states that align more closely with real-world conditions. During sampling, MODS enhances downscaling consistency by incorporating low-resolution ERA5 maps and station-level meteorological data as guidance. Experimental results demonstrate that MODS achieves higher fidelity when downscaling ERA5 maps to a 6.25 km resolution.


Generative assimilation and prediction for weather and climate

Yang, Shangshang, Nai, Congyi, Liu, Xinyan, Li, Weidong, Chao, Jie, Wang, Jingnan, Wang, Leyi, Li, Xichen, Chen, Xi, Lu, Bo, Xiao, Ziniu, Boers, Niklas, Yuan, Huiling, Pan, Baoxiang

arXiv.org Artificial Intelligence

Machine learning models have shown great success in predicting weather up to two weeks ahead, outperforming process-based benchmarks. However, existing approaches mostly focus on the prediction task, and do not incorporate the necessary data assimilation. Moreover, these models suffer from error accumulation in long roll-outs, limiting their applicability to seasonal predictions or climate projections. Here, we introduce Generative Assimilation and Prediction (GAP), a unified deep generative framework for assimilation and prediction of both weather and climate. By learning to quantify the probabilistic distribution of atmospheric states under observational, predictive, and external forcing constraints, GAP excels in a broad range of weather-climate related tasks, including data assimilation, seamless prediction, and climate simulation. In particular, GAP is competitive with state-of-the-art ensemble assimilation, probabilistic weather forecast and seasonal prediction, yields stable millennial simulations, and reproduces climate variability from daily to decadal time scales.


Generative Data Assimilation of Sparse Weather Station Observations at Kilometer Scales

Manshausen, Peter, Cohen, Yair, Pathak, Jaideep, Pritchard, Mike, Garg, Piyush, Mardani, Morteza, Kashinath, Karthik, Byrne, Simon, Brenowitz, Noah

arXiv.org Artificial Intelligence

Data assimilation of observational data into full atmospheric states is essential for weather forecast model initialization. Recently, methods for deep generative data assimilation have been proposed which allow for using new input data without retraining the model. They could also dramatically accelerate the costly data assimilation process used in operational regional weather models. Here, in a central US testbed, we demonstrate the viability of score-based data assimilation in the context of realistically complex km-scale weather. We train an unconditional diffusion model to generate snapshots of a state-of-the-art km-scale analysis product, the High Resolution Rapid Refresh. Then, using score-based data assimilation to incorporate sparse weather station data, the model produces maps of precipitation and surface winds. The generated fields display physically plausible structures, such as gust fronts, and sensitivity tests confirm learnt physics through multivariate relationships. Preliminary skill analysis shows the approach already outperforms a naive baseline of the High-Resolution Rapid Refresh system itself. By incorporating observations from 40 weather stations, 10% lower RMSEs on left-out stations are attained. Despite some lingering imperfections such as insufficiently disperse ensemble DA estimates, we find the results overall an encouraging proof of concept, and the first at km-scale. It is a ripe time to explore extensions that combine increasingly ambitious regional state generators with an increasing set of in situ, ground-based, and satellite remote sensing data streams.
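
As a rough illustration of the held-out station metric mentioned in the abstract above, the sketch below computes an RMSE and the relative RMSE reduction of an assimilated estimate over a baseline. The function names and the toy numbers are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def rmse(pred, obs):
    """Root-mean-square error between predictions and observations."""
    diff = np.asarray(pred, dtype=float) - np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def relative_rmse_reduction(baseline_pred, assimilated_pred, obs):
    """Fraction by which the assimilated estimate lowers RMSE versus the baseline.

    A value of 0.10 corresponds to a 10% lower RMSE on the held-out stations.
    """
    base = rmse(baseline_pred, obs)
    return (base - rmse(assimilated_pred, obs)) / base

# Toy values at four hypothetical left-out stations (not real data).
obs = np.zeros(4)
baseline = np.full(4, 2.0)
assimilated = np.full(4, 1.8)
reduction = relative_rmse_reduction(baseline, assimilated, obs)
```

Here the stations used for scoring would be ones withheld from the assimilation step, so that the score reflects how well the generated field generalizes to unobserved locations.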


Explainable Graph Neural Networks for Observation Impact Analysis in Atmospheric State Estimation

Jeon, Hyeon-Ju, Kang, Jeon-Ho, Kwon, In-Hyuk, Lee, O-Joun

arXiv.org Artificial Intelligence

Weather forecasting, a critical component in industries like transportation and manufacturing, relies heavily on Numerical Weather Prediction (NWP) systems, which are based on 3D physical models and dynamical equations [1, 2]. For NWP systems to predict future atmospheric states effectively, they require accurate current atmospheric states as initial values. This necessity underscores the importance of a data assimilation (DA) system, which approximates the true atmospheric states by merging observations with prediction results from dynamical models [3]. The integration of a wide range of observations, from sources like aircraft, radiosondes, and satellites, is crucial for enhancing the DA system's accuracy [4]. Traditional methods to assess the impact of observations on weather forecasts include forecast sensitivity to observation (FSO) and its variations, such as ensemble FSO and hybrid FSO [2, 5, 6].


CloudNine: Analyzing Meteorological Observation Impact on Weather Prediction Using Explainable Graph Neural Networks

Jeon, Hyeon-Ju, Kang, Jeon-Ho, Kwon, In-Hyuk, Lee, O-Joun

arXiv.org Artificial Intelligence

The impact of meteorological observations on weather forecasting varies with sensor type, location, time, and other environmental factors. Thus, quantitative analysis of observation impacts is crucial for effective and efficient development of weather forecasting systems. However, the existing impact analysis methods are difficult to apply widely due to their high dependence on specific forecasting systems. Also, they cannot provide observation impacts at multiple spatio-temporal scales, only global impacts of observation types. To address these issues, we present a novel system called "CloudNine," which allows analysis of individual observations' impacts on specific predictions based on explainable graph neural networks (XGNNs). Combining an XGNN-based atmospheric state estimation model with a numerical weather prediction model, we provide a web application to search for observations in the 3D space of the Earth system and to visualize the impact of individual observations on predictions in specific spatial regions and time periods.


AtmoDist: Self-supervised Representation Learning for Atmospheric Dynamics

Hoffmann, Sebastian, Lessig, Christian

arXiv.org Artificial Intelligence

Representation learning has proven to be a powerful methodology in a wide variety of machine learning applications. For atmospheric dynamics, however, it has so far not been considered, arguably due to the lack of large-scale, labeled datasets that could be used for training. In this work, we show that the difficulty is benign and introduce a self-supervised learning task that defines a categorical loss for a wide variety of unlabeled atmospheric datasets. Specifically, we train a neural network on the simple yet intricate task of predicting the temporal distance between atmospheric fields from distinct but nearby times. We demonstrate that training with this task on ERA5 reanalysis leads to internal representations capturing intrinsic aspects of atmospheric dynamics. We do so by introducing a data-driven distance metric for atmospheric states. When employed as a loss function in other machine learning applications, this AtmoDist distance leads to improved results compared to the classical $\ell_2$-loss. For example, for downscaling, one obtains higher-resolution fields that match the true statistics more closely than previous approaches, and for the interpolation of missing or occluded data, the AtmoDist distance leads to results that contain more realistic fine-scale features. Since it is derived from observational data, AtmoDist also provides a novel perspective on atmospheric predictability.
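
The self-supervised task described in the abstract above (classifying the temporal distance between two nearby atmospheric fields) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the uniform sampling of offsets, and the `max_offset` parameter are assumptions made for the example, and the network that would produce the logits is left out.

```python
import numpy as np

def sample_time_pairs(fields, max_offset, n_pairs, rng):
    """Draw pairs of fields separated by 1..max_offset time steps.

    fields: array of shape (T, ...) ordered in time with a fixed step,
            where T must exceed max_offset.
    Returns (first, second, labels); labels[i] = offset - 1, so the task
    is a max_offset-way classification problem.
    """
    n_times = fields.shape[0]
    # Start indices are chosen so that start + max_offset stays in range.
    starts = rng.integers(0, n_times - max_offset, size=n_pairs)
    offsets = rng.integers(1, max_offset + 1, size=n_pairs)
    return fields[starts], fields[starts + offsets], offsets - 1

def cross_entropy(logits, labels):
    """Mean categorical cross-entropy for logits of shape (n, n_classes)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

# Toy data: 50 time steps of a 2x2 field whose value equals the time index.
rng = np.random.default_rng(0)
fields = np.arange(50, dtype=float)[:, None, None] * np.ones((1, 2, 2))
first, second, labels = sample_time_pairs(fields, max_offset=8, n_pairs=16, rng=rng)

# A model that has learned nothing outputs uniform logits over the 8 classes.
uniform_loss = cross_entropy(np.zeros((16, 8)), labels)
```

No manual labels are needed because the target class comes from the time stamps themselves, which is what makes the task applicable to unlabeled reanalysis data.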